I. INTRODUCTION


Software maintenance now accounts for a large percentage
of any software system's life-cycle cost. In view of this,
the software industry has shifted its emphasis with respect
to program evaluation. No longer is software being judged
solely on the merits of its applicability to a given
problem. While not neglecting the importance of this, the
industry is considering factors which affect software
maintenance as well. One such factor is software
understandability [Ref. 1].

Gaining an understanding of unfamiliar programs is
frequently cited by researchers as the first and often most
costly step in software maintenance. This understanding is
achieved when the programmer has “learned” all that is
necessary to competently carry out the required maintenance
task. Making software easier to understand would have
significant long term advantages resulting in reduced
life-cycle costs. This study presents a theoretical model of
cognitive processes, based on observed programmer behavior,
which aids in acquiring this understanding. Further, the
study contends that the effectiveness of these processes is
dependent upon the extent of the programmer's knowledge
base.


Most cognitive research analysing programmer behavior
supports the idea of levels of skill or ability, and
categorizes programmers as either novice, experienced, or
expert. Based on the proposed theoretical model, this
ability is defined by how well the processes are developed
by the programmer, and the extent of his or her knowledge
base.

A novice has a relatively limited knowledge base.
Consequently, there is very little development of the
cognitive processes in evidence. He or she is considered
primarily a learner, using mainly unsophisticated
techniques, such as inductive reasoning, to gain an
understanding of a program.

An experienced programmer has a fairly extensive
knowledge base. It includes information about most of the
knowledge domains necessary for program understanding. The
depth of information in these domains is, however, uneven.
By this it is meant that an experienced programmer may know
algorithms to perform a certain function, for example to
sort numbers, but may find it difficult to adapt these to
sort words. Or, in the category of programming languages,
he or she may be familiar with the syntax and semantics,
but unsure of the underlying design and its effects on a
program.


Although still learning, the primary emphasis at this
stage of a programmer's growth is the development of
cognitive processes which make efficient use of this
knowledge. At this stage, the programmer’s performance is
good, though inconsistent, over a spectrum of less difficult
tasks. It does, however, degrade rapidly as task difficulty
increases, indicative of only partially developed processes
and the uneven knowledge base.

An expert, on the other hand, has acquired a broad
knowledge base, including many specifics about programming
languages and design, algorithms and data structures, task
domains, etc., as well as how they relate to one another.
He or she has a consistently high level of performance in
proportion to task difficulty. This results from a
demonstrated use of well developed cognitive processes.

These processes, which make use of the knowledge base,
in conjunction with external information (program text,
documentation, problem specifications, etc.), enhance the
expert's ability to gain an in-depth understanding of the
software involved in a given maintenance task. It is this
demonstrated capability that distinguishes the expert from
either a novice or experienced programmer.

Acknowledging this, the choice for this study is to
model an expert involved in the task of understanding an
unfamiliar program in order to perform some type of
maintenance. What these processes are, how they are used,
and what information is contained in the knowledge base,
form the major portions of this model. Realizing the
subjective nature of the study, no claim is made that this
is a definitive model. It is, however, reasonable and
representative of programmer behavior demonstrated by
experts. In fact, this study contends that it is this very
behavior of making efficient use of these processes which
determines expertise in this area.




II. MEMORY AND RECALL


We know empirically that information is remembered: it is
stored in the brain and can be recalled. Most evidence
also supports the hypothesis that human memory is at least
partly associative [Ref. 2]. By this it is meant that
facts, events, concepts, and other types of information are
encoded and stored in memory as separate elements or sets of
elements, connected to one another by means of association.
Each element is stored only once, but can have any number of
associations with other elements. Each element is also
directly accessible. One method of knowledge representation
which incorporates many of the concepts and properties
associated with this type of memory is the semantic net.

As there is no evidence that strongly supports any
theory yet proposed to explain how memory and recall are
accomplished, it should be noted that the model proposed
here uses semantic nets only as a tool. The ideas of
semantic nets will aid in explaining certain cognitive
processes. However, the model itself has been developed
based on research data and its validity is independent of
this or any other theory regarding how these rudimentary
cerebral functions, memory and recall, are accomplished.

Memory is commonly thought of as having two parts or
areas. These are labeled Long Term Memory and Short Term
Memory. This may not be a physical division, though some
researchers suggest that they are located in different areas
of the brain, but rather one of cognition. Some researchers
also include a third area, Working Memory. As the validity
of this additional division of memory is not critical to the
model, the simpler idea is adopted. A final form of
“memory”, called External Memory, is also used.


A. SEMANTIC NETS 

A semantic net is a directed graph made up of nodes,
representing objects, connected to one another via links.
These links indicate specific relationships or associations
between nodes. This representation of knowledge is very
popular among members of the Artificial Intelligence
community. As there is no definitive set of characteristics
for a semantic net, those relevant to the model proposed
here are described. Much of this information is taken from
a text by WINSTON [Ref. 3], whose description seems standard
when compared to others in the literature. Properties have
been added or altered, however, to aid in explaining certain
behaviors of expert programmers. It is emphasized again
that the model is based on observed behavior, and in no way
depends on the validity of this presentation of semantic
nets, or any other knowledge representation.

Three terms are used here to describe semantic nets.
The objects of the net are called nodes and the relations
between objects are called links. They are represented in
the figures by labeled circles and arrows respectively. A
third term used by WINSTON, which is less standard, is the
slot. The slots of a node are the different named links
originating at the node. An example will serve here to
better describe the use of these terms.

In Figure 1, we have an example of a semantic net. The
objects are CAR27, which is a specific car; CAR, which is
a general abstraction; DOUG and JILL, which represent
specific people; and the object BLUE. There is an OWNED-BY
link between CAR27 and DOUG, and between CAR27 and JILL.
There is an IS-A link between CAR27 and CAR, and there is a
COLOR link between CAR27 and BLUE. CAR27 has four links
originating from it, but only three slots. The COLOR slot is
filled with the value BLUE, the IS-A slot with the value
CAR, and the OWNED-BY slot with the values DOUG and JILL.
Note that the objects do not have to be tangible, as
illustrated by the object BLUE. Figure 1 is, of course, a
representation of the knowledge that CAR27 is a blue car
owned by Doug and Jill.

Figure 1 - A simple semantic net
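
The net of Figure 1 can be written down as a small data
structure. The following is only a minimal sketch, assuming a
dictionary-of-lists layout; the class SemanticNet and its
method names are hypothetical and are not taken from WINSTON
or from this study.

```python
from collections import defaultdict


class SemanticNet:
    """Directed graph whose edges carry a link name (illustrative only)."""

    def __init__(self):
        # node -> link name -> list of target nodes
        self._links = defaultdict(lambda: defaultdict(list))

    def add_link(self, source, name, target):
        self._links[source][name].append(target)

    def slots(self, node):
        """Return the slots of a node: link name -> list of values."""
        return {name: list(values) for name, values in self._links[node].items()}

    def link_count(self, node):
        """Count the individual links originating at a node."""
        return sum(len(values) for values in self._links[node].values())


net = SemanticNet()
net.add_link("CAR27", "IS-A", "CAR")
net.add_link("CAR27", "COLOR", "BLUE")
net.add_link("CAR27", "OWNED-BY", "DOUG")
net.add_link("CAR27", "OWNED-BY", "JILL")

car27_slots = net.slots("CAR27")
```

Here CAR27 has four links but three slots, because the two
OWNED-BY links share a single slot holding two values.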


[Figure labels: TRANSPORT, USED-FOR, AKO, HAS, COLOR,
PROPULSION, IS-A, OWNED-BY]

Figure 2 - Inheritance in Semantic Nets


When CAR27 is thought of, many facts about it come to
mind. It has an engine, tires, and seats. Also, it is a
vehicle used for transportation. Does this mean that, using
our representation, the object CAR27 should have direct
links to the objects ENGINE, TIRE, SEAT, VEHICLE, and
TRANSPORTATION? The answer is no. The way this information
is represented is through a property called inheritance, and
the use of frames.

Inheritance is an object’s acquisition of a slot value
by inheriting the value from another object through
association. Figure 2 is a semantic net showing one
representation of the above facts about CAR27. As can be
seen, CAR27 has no USED-FOR link, but does have an IS-A link
to the more abstract object, CAR. However, CAR also has no
USED-FOR link, but is associated to the object VEHICLE
through an AKO (A Kind Of) link. In tracing the net from
CAR27, VEHICLE is the first node reached which does have a
USED-FOR slot value, TRANSPORTATION. CAR27, therefore,
inherits this value through its indirect association with
VEHICLE.
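
The tracing described above can be sketched as a short upward
search. This is an illustration under stated assumptions: the
dictionary layout, the name inherited_value, and the rule of
following the first IS-A or AKO link found are choices made
for this example, not part of the model.

```python
# Each node maps slot names to values; IS-A and AKO point to a more
# abstract node. This layout is an assumption made for illustration.
NET = {
    "CAR27": {"IS-A": "CAR", "COLOR": "BLUE"},
    "CAR": {"AKO": "VEHICLE"},
    "VEHICLE": {"USED-FOR": "TRANSPORTATION"},
}

PARENT_LINKS = ("IS-A", "AKO")


def inherited_value(net, node, slot):
    """Walk upward from node and return (value, provider) for the first
    node that has the slot, or (None, None) if no node on the path does."""
    seen = set()
    while node is not None and node not in seen:
        seen.add(node)
        slots = net.get(node, {})
        if slot in slots:
            return slots[slot], node
        # Follow the first IS-A or AKO link, if any, toward the abstraction.
        node = next((slots[link] for link in PARENT_LINKS if link in slots), None)
    return None, None


value, provider = inherited_value(NET, "CAR27", "USED-FOR")
```

The search passes through CAR, finds no USED-FOR slot there,
and stops at VEHICLE, which supplies the value.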

Again looking at Figure 2, notice the object CAR is
linked to some familiar characteristics of a car via HAS
links. This area of the net, isolated in Figure 3, is
called a FRAME. A frame is a set or cluster of objects
which serve as slot values for an abstract or less specific
object. Its purpose is to group properties common to many
specific objects which are instances of this abstraction.
These properties or slot values are then inherited by the
more specific instances, making the net less complicated.

Slots can be added to or, although less likely,
subtracted from a frame. This would occur due to additional
information being incorporated into the net. Because of the
dynamics of frames, they always represent the most current
abstraction relative to the entire semantic net.


[Figure labels: HAS, COLOR, PROPULSION]

Figure 3 - A Frame


A frame also serves to provide DEFAULT values for
incomplete pictures. Let's say, for illustrative purposes,
that one of the slots of the frame representing CAR is the
COLOR slot, and it is filled with the value RED. Now
further suppose another object, CAR28, is introduced, but
without a COLOR link. Since all cars must have some
specific color, CAR28 is incomplete. To remedy this, it
inherits the default value RED, until such time as its own
color is added to the knowledge base.

Exceptions and unusual circumstances must also be
accounted for. Using the CAR example again, suppose CAR28
is an experimental model using compressed air for power.
The PROPULSION slot of the CAR frame is filled with the
value ENGINE, yet for CAR28, this would be incorrect. Prior
to knowing the method of PROPULSION, it is ‘assumed’ that
CAR28 is powered by an engine. Once the method is known,
however, a PROPULSION link is added to CAR28, reflecting the
exception. Now, in trying to fill the PROPULSION slot for
CAR28, the first value arrived at is COMPRESSED-AIR, the
search stops, and the frame slot value becomes
inconsequential. Figure 4 is the representative net.
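
The behavior of defaults and exceptions can be illustrated
with the same kind of upward search. The sketch below is an
assumption-laden example: the function lookup and the
dictionary layout are hypothetical, and only the slot names
and values come from the CAR28 example.

```python
def lookup(net, node, slot):
    """Return the first value found for slot, searching the node itself
    and then each abstraction reached through IS-A links."""
    while node is not None:
        slots = net.get(node, {})
        if slot in slots:
            return slots[slot]
        node = slots.get("IS-A")
    return None


net = {
    "CAR": {"COLOR": "RED", "PROPULSION": "ENGINE"},  # the CAR frame
    "CAR28": {"IS-A": "CAR"},                         # nothing known yet
}

before = lookup(net, "CAR28", "PROPULSION")     # frame value is assumed
net["CAR28"]["PROPULSION"] = "COMPRESSED-AIR"   # the exception becomes known
after = lookup(net, "CAR28", "PROPULSION")      # search now stops at CAR28
color = lookup(net, "CAR28", "COLOR")           # still filled by the frame
```

Adding the link to CAR28 does not change the frame; it only
causes the search to end before the frame is reached.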

By this explanation, it may appear that all objects
making up a frame are default values, and exceptions nothing
more than specific slot values in lieu of the default.
Each, however, is subtly different. A frame is made up of
attributes of an object. Some, such as engine, tire, or
seat, are common to the majority and as such are not
substitute values, used for lack of one more specific, but
the same value shared among many objects. An exception is
where particulars of an object contradict any of these
shared values. Others, such as color, are common attributes
with possibly different values for each instance of the item
whose abstraction is represented. These are truly default
values, whose purpose is to fill a void until more specific
information is obtained. This information is not an
exception to the frame, but an expected piece of data
previously missing or unknown.


[Figure labels: TRANSPORT, USED-FOR, SEATS, AKO, HAS, COLOR,
PROPULSION, RED, IS-A, COMPRESSED AIR]

Figure 4 - Semantic Net with Exception


Another quality of an associative memory is the ability
to distinguish the correct usage of an object, through
context or perspective, when many different meanings exist.
This dependency on context must also be represented in the
net. Work cited by COHEN supports the idea that objects
each have many classifications, determined by context
[Ref. 4]. This is because certain objects, when viewed
from different perspectives, take on new or different
qualities and attributes. A car, for example, can be looked
at as an automobile, or as a toy, or as the car of a train.
Obviously, each will have different attributes which are
identified through context. The result is one object with
three distinct purposes or aspects.









[Figure labels: CAR, PERSPECTIVE, PROPULSION, PEDALED]

Figure 5 - A Perspective Node Bundle


One way to represent this in a semantic net is to view
an object as a node bundle. This bundle consists of a
general object node as well as a number of nodes each
representing a different perspective for that object. Links
relevant to a particular context are associated with the
corresponding perspective node.

With such a representation, shown for CAR in Figure 5,
slot values are accessed either with or without a
perspective. Say, for example, the size of CAR is needed.
If CAR is with reference to a train the returned value would
be quite a bit different than if the inquiry were made for a
toy car. If no perspective is given, the node bundle
collapses to the single CAR node used throughout this
example. This causes all possible slot values to be
returned, each annotated with the associated perspective.
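
Access with and without a perspective might be sketched as
follows. The perspective names follow the automobile, toy,
and train example; the function slot_values, the dictionary
layout, and every SIZE value are invented placeholders used
only for illustration.

```python
# Slot values grouped by perspective; all sizes are invented placeholders.
CAR_BUNDLE = {
    "AUTOMOBILE": {"SIZE": "about 5 m", "PROPULSION": "ENGINE"},
    "TOY": {"SIZE": "about 1 m", "PROPULSION": "PEDALED"},
    "TRAIN": {"SIZE": "about 20 m"},
}


def slot_values(bundle, slot, perspective=None):
    """With a perspective, return that perspective's value (or None).
    Without one, collapse the bundle and return every value for the
    slot, annotated with the perspective it came from."""
    if perspective is not None:
        return bundle.get(perspective, {}).get(slot)
    return {name: slots[slot] for name, slots in bundle.items() if slot in slots}


train_size = slot_values(CAR_BUNDLE, "SIZE", "TRAIN")
all_propulsion = slot_values(CAR_BUNDLE, "PROPULSION")
```

With no perspective supplied, the result lists each available
value together with the perspective that provides it.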

This notion of node bundles and object classification
leads to the idea of node clustering. Put simply, a node
cluster is a grouping in the net of objects and links
strongly associated with one or two specific objects of the
cluster. MINSKY uses a geographic analogy to illustrate the
idea [Ref. 5]. He suggests picturing capital
cities with streets lined with houses. These cities are
connected via major thoroughfares to smaller suburban cities,
which are in turn connected to towns, etc. The analogy to
clusters, objects, and links is readily apparent.

The implication of this analogy is that semantic nets
are organized hierarchically. If this idea is accepted, it
follows that in order to recall a certain piece of
information, several levels of the hierarchical structure
must be transited depending on the point of entry. This
walk through several levels necessarily has an adverse
effect on the speed of recall. Yet, in some instances,
information which should be separated by several levels is
recalled faster than expected, implying an alternative
method. To explain this, MINSKY introduces a second notion
which allows for shortcuts through several levels. The
argument is that if a certain path is reinforced a number of
times through use, a direct link is formed, analogous to
taking back roads to avoid lights and traffic.
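
The shortcut idea can be illustrated with a small simulation.
This is a sketch under assumptions: the class ReinforcedNet,
the fixed threshold of three uses, and the node names are all
hypothetical, and MINSKY gives no such numeric rule.

```python
from collections import Counter, deque


class ReinforcedNet:
    def __init__(self, threshold=3):
        self.edges = {}             # node -> set of directly linked nodes
        self.uses = Counter()       # (start, goal) -> number of recalls
        self.threshold = threshold  # recalls needed before a shortcut forms

    def link(self, a, b):
        self.edges.setdefault(a, set()).add(b)
        self.edges.setdefault(b, set())

    def hops(self, start, goal):
        """Breadth-first search; number of links on the shortest path."""
        queue, seen = deque([(start, 0)]), {start}
        while queue:
            node, dist = queue.popleft()
            if node == goal:
                return dist
            for nxt in self.edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
        return None

    def recall(self, start, goal):
        """Return the hop count, then reinforce the path; after enough
        uses of a multi-level path, add a direct link."""
        dist = self.hops(start, goal)
        if dist is not None and dist > 1:
            self.uses[(start, goal)] += 1
            if self.uses[(start, goal)] >= self.threshold:
                self.link(start, goal)
        return dist


net = ReinforcedNet(threshold=3)
for a, b in [("CAPITAL", "SUBURB"), ("SUBURB", "TOWN"), ("TOWN", "HOUSE")]:
    net.link(a, b)

history = [net.recall("CAPITAL", "HOUSE") for _ in range(4)]
```

The first three recalls each cross three levels; the fourth
uses the direct link formed by the repeated use.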

These properties of semantic nets reflect those of an
associative memory and will be referred to extensively
throughout the remainder of this paper. Details will be
added as necessary, to further explain behaviors, and this
should make these semantic net properties clearer. However,
it is important for the reader to understand these before
proceeding.

B. SHORT TERM MEMORY

Information enters the cognitive system through short
term memory. CURTIS [Ref. 6] quite adequately describes
this memory as:

"a limited capacity workspace which holds and processes
those items of information currently under our attention."

This limited capacity was first quantified by MILLER as 7±2
items [Ref. 7]. As will be seen later, an item is not
limited to a single memory element, and may be a “chunk” of
indefinite size.

The information which exists in short term memory is
transient and must be constantly used or ‘rehearsed’ to
prevent its rapid decay [Ref. 8]. If the information is
gained via perception, this rehearsal will, after a time,
fix the information in long term memory. This is sometimes
called the learning process. If, on the other hand, the
information being used was recalled from long term memory,
this rehearsal serves to reinforce it. This reinforcement
has a positive effect on the future recall of this
information and may cause it to migrate due to repetitive
use. Both rapidity of recall and information migration are
discussed later as they pertain to the model.

C. LONG TERM MEMORY

When we learn or memorize something, the information is
retained in long term memory. When some event causes the
recall of other events in the mind, the information comes
from long term memory. It is the reservoir of permanent
knowledge used in cognition, and has stored in it everything
from the spatial model of the world to the motor and
perceptual skills used moment to moment [Ref. 9].
Put simply, it is the knowledge base we operate from.

Unlike short term memory, the capacity of long term
memory seems virtually unlimited. It receives and stores
new information after processing in short term memory, and
this information is directly accessible, once stored. Also,
research has shown that the knowledge in long term memory is
organized, and that the organization may change almost
instantaneously, based on the context of the information
being processed in short term memory. As will be seen
later, this ability is significant in terms of the model,
and will be discussed in more detail as it relates to an
expert programmer's knowledge base.


D. EXTERNAL MEMORY

As an aid to information processing, external devices
such as pencil and paper, chalkboards, and tape recorders
are used to store information not in long term memory which
the programmer wants readily available for reference. This
helps to compensate for the limited capacity of short term
memory, and complements long term memory. All methods used
for this purpose are generally referred to as external
memories.


III. KNOWLEDGE BASE


“Experts and novices differ in their abilities to process
large amounts of meaningful information....A common
explanation of this difference is that experts have not
only more information, they have the information better
organized...making their perception more efficient and
their recall performance much higher.” [Ref. 10]

The above quote emphasizes the importance of both the
contents and the organization of the knowledge base.
Included in the discussion presented here is the conviction
that the contents of memory somehow affect this
organization. Also, based on data from several studies
referenced, this organization is dynamic and dependent on
context.


A. CONTENTS 

Along with basic knowledge, normally acquired through
grade school and college, the expert programmer knows a
great deal about five major categories of knowledge
associated with programming. These are:

ALGORITHMS

PROGRAMMING LANGUAGES

LOGIC

DATA STRUCTURES

PROGRAMMING DESIGN METHODOLOGIES

The depth of knowledge in these categories allows the expert
to quickly focus on the important aspects of new
information. Using the processes covered in the next
chapter, he or she can then encode this information and
relate it to what is already in long term memory.

Experts are familiar with many algorithms which do
essentially the same job. Associated with each in the
knowledge base is a set of benefits, drawbacks,
applications, and, either implicitly or explicitly, a
complexity evaluation. Choosing integer sorting as a
representative task, there are several options: Merge Sort,
Comparison Sort, Radix Sort, and Quick Sort to name a few.
Each is useful in accomplishing the sort; however, each is
also especially suited to certain applications. Each also
has variations which are applicable to other types of sorts.
The expert is familiar with these, as well as the underlying
principles which differentiate them from one another. This
allows him or her to readily adapt these algorithms to meet
different needs, lexicographic sorting for instance.
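
As an illustration of such an adaptation, the sketch below
takes a merge sort written for integers and reuses it for
words by changing only the value being compared. The function
name merge_sort and its key parameter are choices made for
this example; the study does not prescribe any particular
implementation.

```python
def merge_sort(items, key=lambda x: x):
    """Stable merge sort; key maps each item to the value compared."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], key)
    right = merge_sort(items[mid:], key)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        # <= keeps equal items in their original order (stability).
        if key(left[i]) <= key(right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


numbers = merge_sort([42, 7, 19, 3])
# Same algorithm, adapted to case-insensitive lexicographic order.
words = merge_sort(["pear", "Apple", "fig"], key=str.lower)
```

The underlying principle (divide, sort, merge) is untouched;
only the comparison is adapted to the new kind of data.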

Like algorithms, data structures have many variations.
The expert is familiar with these and with the underlying
principles behind their design as well. This allows easy
modification to meet new requirements and aids the expert in
recognizing design flaws such as lack of flexibility or
expandability. The expert also has knowledge of algorithms
and can correlate a given data structure with an algorithm
or group of algorithms for a specific application. The
expert can also relate information on programming languages
to data structures, evaluating the relative ease with which
specific structures can be used and manipulated.

Programming languages are, to some degree, familiar to
all programmers, whatever their skill level. An expert,
however, is not only versed in the syntax and semantics of
several languages. He or she is also familiar with the
advantages and disadvantages of one language design, or
particular machine implementation, over another. While the
choice of language is not an option for the programmer
tasked with maintaining or debugging, the particular design
and implementation features play an important role when
porting software from one machine to another.

Knowledge of language design and implementation also
allows the expert to make judgements about software
efficiency and memory needs. This knowledge also allows for
identifying potential trouble spots, usually avoiding
analysis of the entire program. This is particularly
important when evaluating possible effects of a
modification.

Information about algorithms also contributes to the
knowledge of languages. As most languages have built-in
functions, the expert can evaluate the particular algorithms
used to implement these. This evaluation adds to his or her
knowledge base of programming languages, aids in efficiency
analyses, and is useful in predicting the accuracy of
results. Supported by this knowledge, an expert may choose
to substitute other routines using more applicable
algorithms, for such things as increased accuracy in
calculations, more efficient device drivers, or faster
access to secondary storage. He or she might also choose to
replace programmed functions with ones built into the
language, for the same reasons.

Knowledge regarding logic is important in two ways.
First, it enables the expert to learn the specific
implementation of control statements in a programming
language, adding this to his or her knowledge base. Second,
it aids in evaluating the flow of control in a given piece
of software. Both help in analysing the efficiency of the
software. Taking the following IF-THEN statement:

IF (A > 10) OR (B < 15) THEN C = D

the expert would know, or could test, whether or not the
second comparison is executed independent of the result of
the first. Taking advantage of this type of information
could greatly impact the software’s efficiency, saving money
and CPU time.
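
Such a test is easy to write. The sketch below uses Python
purely as an example language; whether the second comparison
is skipped depends on the language and operator in question.
The helper names greater_than and less_than and the values of
a and b are assumptions made for the illustration.

```python
calls = []  # records which comparisons were actually evaluated


def greater_than(value, limit):
    calls.append(f"{value} > {limit}")
    return value > limit


def less_than(value, limit):
    calls.append(f"{value} < {limit}")
    return value < limit


a, b = 11, 20

# Python's `or` short-circuits: the right operand is skipped when the
# left operand is already true.
short_result = greater_than(a, 10) or less_than(b, 15)
short_calls = list(calls)

# The bitwise `|` on booleans evaluates both operands every time.
calls.clear()
eager_result = greater_than(a, 10) | less_than(b, 15)
eager_calls = list(calls)
```

Both forms give the same result, but only the second always
pays for the second comparison.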

Programming design methodologies are treated differently
from other categories in the knowledge base. They cannot
be defined in specific terms, as we have done with the
others, and are seen as more of a gestalt type of knowledge.
They help the expert in analysing possible side effects,
which is, in part, a function of modularity. They play a
major role in processes to be presented later, such as
CHUNKING, SLICING, and HYPOTHESIZING.

Aside from knowledge of programming, the expert
maintainer must also know something of the specific
application area. The level or amount of information
necessary is dependent upon the modification to be
implemented. At the very least, however, the programmer
needs to know enough to be able to interpret the
documentation and program specifications in order to make a
judgement regarding potential side effects of the change.
This information is either learned information in long term
memory, which can be recalled for future tasks, transient
information used and then forgotten, or information kept as
reference using an external memory.

The view of this study is that what is contained in the
knowledge base directly affects the programmer’s ability to
understand a given piece of software. Obviously, what the
programmer knows at the outset about the program's task
domain, and information related to it, will impact on his or
her difficulty in gaining this understanding. Extending
this idea, a large disparity in the knowledge level
significantly affects the level of competence of the
programmer and, consequently, the relative quality of the
work.

The cognitive processes which interact with this
knowledge base, in order for the programmer to achieve this
understanding, perform essentially three functions. Factual
information is analysed and added to the knowledge base, or
concepts and methodologies are abstracted from
documentation, or information from one category is
associated with that from another (such as correlating a
data structure with an algorithm). These functions serve to
integrate all information available to the programmer
applicable to the task.

This knowledge base is not simply a collection of facts.
It is the organized accumulation of information into a
network reflecting semantic associations. This organization
is as important as the information itself.


B. ORGANIZATION 

Studies of recall show that people tend to organize
information into categories and groupings. Most items or
objects in memory are members of more than one of these
categories, dependent on context. A piano is a member of
the musical instrument category, and can be sub-categorized
as a keyboard instrument in the context of musical
instruments. It is also a member of the category which
includes hutch and dresser when viewed as a heavy piece of
furniture.
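
Membership that depends on context can be sketched as a
lookup keyed by both item and context. In the example below
the context names MUSIC and MOVING, the extra items ORGAN and
VIOLIN, and the function group_by_context are all invented
for illustration; only the piano, hutch, and dresser come
from the text.

```python
# item -> context -> category the item falls into under that context
MEMBERSHIP = {
    "PIANO": {"MUSIC": "KEYBOARD INSTRUMENT", "MOVING": "HEAVY FURNITURE"},
    "ORGAN": {"MUSIC": "KEYBOARD INSTRUMENT"},
    "HUTCH": {"MOVING": "HEAVY FURNITURE"},
    "DRESSER": {"MOVING": "HEAVY FURNITURE"},
    "VIOLIN": {"MUSIC": "STRING INSTRUMENT"},
}


def group_by_context(membership, context):
    """Regroup the same items according to the context supplied."""
    groups = {}
    for item, categories in membership.items():
        if context in categories:
            groups.setdefault(categories[context], []).append(item)
    return groups


as_music = group_by_context(MEMBERSHIP, "MUSIC")
as_moving = group_by_context(MEMBERSHIP, "MOVING")
```

The stored items never change; only the grouping does, which
is the sense in which the organization depends on context.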




Grouping by order is another observed way memory has
been organized. A person asked to list the ingredients of a
recipe, for example, will more than likely list them in
order of their use. When asked to list items necessary to
equip a home, housewives listed these items either by
category (kitchen utensils, furniture, window coverings) or
by considering necessary items room by room [Ref. 4].

The evidence provided by these studies supports the
hypothesis that memory is organized dynamically, based on
the context of the stimulus. It also implies that this
organization makes use of information clustering. What is
meant here is that information elements related by context
‘migrate’ toward certain key elements or toward one another.
In either case, this clustering strengthens associations in
context between these information elements, enhancing
recall. As explained in a later chapter, this enhancement
aids cognition by making pertinent information readily
available to short term memory, while ‘blocking’ irrelevant
associations involving these same elements.

Because these groupings are determined by context, the
amount of information contained in the knowledge base
associated with each element has a bearing on their
categorization. The greater the amount of associated
knowledge, the more refined the groupings can be. As more
knowledge is gained and this refinement continues, new
clusters are formed to replace those less refined, and the
association between any two becomes more specific. This, in
turn, results in a reorganization of memory.

The studies cited here involve simple element lists.
However, this idea is easily extended to more complex
information elements, such as concepts, ideas, and
abstractions, which are themselves clusters of information.
The implication throughout this chapter is that different
knowledge categories or domains are used best when
integrated. How the contents and organization of memory
relate specifically to the expert, and how this integration
is accomplished, is addressed in the following chapter.


IV. THE PROCESSES 


SHNEIDERMAN and MAYER conjecture that, to facilitate
program comprehension:

“the programmer, with the aid of his or her syntactic
knowledge of the language, constructs a multileveled
internal semantic structure to represent the program.”
[Ref. 11]

The present study has identified, in the context of software
maintenance, three major complementary cognitive processes,
supported by certain lesser ones, used to accomplish this.
Further, it is the tenet of the study that the entire
program need not be represented in memory, but only that
part which is of interest as determined by the programmer.

The descriptions of these processes have been formulated
from observed programmer behavior. The ideas presented are
extensions of theories based on empirical data resulting
from limited testing. Introduction and subsequent treatment
of these ideas in the literature has been, in many cases,
artfully vague, with researchers characteristically relying
on intuitive understanding through example. Therefore,
although an attempt is made here to more clearly define
these processes, the next chapter presents a scenario
exemplifying the application of each.

A. CHUNKING

The cognitive process known as 'chunking' is a learned
skill, enabling a programmer to encode information in such a
way that a group of information elements can be represented
and processed as a single element in short term memory
[Ref. 7]. As mentioned previously, short term memory is
where information processing occurs, and is characterized as
having a limited capacity. This grouping or organizing of
information allows programmers to operate on ‘chunks’ of
associated information rather than single items. This
translates to giving the programmer a broader perspective of
the task.

Chunking is a very dynamic process, in terms of the
knowledge base. A chunk is created when an association is
formed between an encoded item in short term memory and its
corresponding information cluster in long term memory. This
cluster is the result of a reorganization of memory based on
the context of the stimulus which initiated the chunking
process. It can be added to or deleted from, based on the
results of partial completion of the task for which it was
created, or as information is learned, regarding the task,
through other processes.

Chunking associations may also be formed between the
encoded item and information in external memories. These
associations may access information directly, or might
simply guide the programmer to a reference in which the
necessary information is contained. In either case, they
allow the programmer the use of transient or task specific
information. At the same time, they relieve the
programmer of the burden of having to learn the information
so it might be added to the cluster, or of having to store
it in short term memory before it is needed.

The amount of information represented by a chunk is
arbitrary [Ref. 12]. Its size is dependent on how much
associated information is contained in the knowledge base,
and to what extent external memories are used. The results
of research by MILLER and others indicate that the number of
items used or stored in short term memory is relatively
constant. From this it can be concluded that the number of
chunks which can be processed is independent of chunk size
[Ref. 13: pg. 175; Ref. 9: pg. 44]. Thus, chunking
effectively increases the capacity of short term memory as
relates to information processing.
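
The capacity argument can be illustrated with a small sketch. The
Python fragment below is an assumed model only and is not part of
the study: the capacity of seven slots, the displacement of the
oldest item when memory is full, and the names STM_CAPACITY,
ShortTermMemory, and encode are all hypothetical. It shows the
number of items held staying fixed while the amount of information
they stand for grows with chunk size.

```python
from collections import deque

# Hypothetical capacity, chosen only for illustration.
STM_CAPACITY = 7


class ShortTermMemory:
    """Holds a fixed number of items, whatever each item represents."""

    def __init__(self, capacity=STM_CAPACITY):
        # Assumption: when full, the oldest item is displaced.
        self.slots = deque(maxlen=capacity)

    def encode(self, label, elements):
        """Store one chunk: a label standing for a group of elements."""
        self.slots.append((label, tuple(elements)))

    def items_held(self):
        return len(self.slots)

    def elements_represented(self):
        return sum(len(elements) for _, elements in self.slots)


lines = [f"line {n}" for n in range(1, 22)]  # 21 lines of code

# Unchunked: every line occupies its own slot.
unchunked = ShortTermMemory()
for line in lines:
    unchunked.encode(line, [line])

# Chunked: the same lines grouped three at a time under one label.
chunked = ShortTermMemory()
for start in range(0, len(lines), 3):
    chunked.encode(f"chunk {start // 3 + 1}", lines[start:start + 3])

print(unchunked.items_held(), unchunked.elements_represented())  # 7 7
print(chunked.items_held(), chunked.elements_represented())      # 7 21
```

In both cases seven items are held, but the chunked memory
represents all twenty-one lines rather than the last seven.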

Besides having the ability to handle more information in
short term memory, chunking also allows the programmer quick
access to specific information which is part of the chunk.
The reason is that chunks, representing information
clusters, enhance recall of that information. All knowledge
associated with the chunk has effectively been accessed, and
can be thought of as staged for recall. This can best be
explained by using a semantic net representation.

When the chunk is created, a reorganization of the
knowledge base takes place, and information migrates,
forming a high density node cluster. Again, the size of
this cluster depends on the extent of the knowledge base.
This density decreases the length of nodal links, resulting
in a shorter walk from the initial access node or capital of
the cluster to the desired information element. The
association between the encoded item and the knowledge base
is one example of the "shortcut" described earlier, and
links short term memory to the capital of the cluster.

The perspective has also been identified and
associations between nodes not in context have been
deemphasized. All the information represented by the chunk
is now just beyond the programmer's consciousness waiting to
be recalled. The encoded item can therefore be processed,
representing a group of knowledge, with specific items
associated with the chunk rapidly recalled for use when
necessary.

Some researchers, such as KINTSCH, suggest that chunks,
once formed, can be permanently stored in long term memory
[Ref. 13: pg. 175]. This idea is inconsistent with the
presentation here, and research for this study has uncovered
no data to support the hypothesis. KINTSCH himself
differentiates between what a chunk is in short and long
term memory. His idea of stored chunks closely corresponds
to the earlier presentation of information clustering. As
it is the contention of this study that a chunk exists only
so long as it is under the programmer's attention, this
notion of permanently stored chunks is disregarded.


B. SLICING

Expert programmers break large unfamiliar programs into
smaller coherent pieces in order to gain an understanding of
their function and/or design. Often, these pieces are
determined by the original writers of the code. They are
identified as blocks of code in the form of subroutines,
procedures, functions, and the like. Identification is
usually explicit and the pieces are written into the source
as contiguous lines of program text. One can think of these
as functional pieces of the program.

Also, experts routinely partition programs in ways that
do not conform to textual, modular, or functional structure,
permitting multiple views of the same code. Unlike
functional pieces, which have a one-to-one correspondence
between function and purpose of code lines, this type of
division allows lines of code to be viewed from different
perspectives. This associates a single line of code with
more than one purpose. The construction of these pieces is
what WEISER, who first proposed the idea, calls 'program
slicing'. The process is used to strip from a program
statements which do not influence a specific behavior or
slicing criterion. The result is an abstract representation
of the program as viewed from the perspective of the
specific behavior. This group of statements, usually
associated with a single variable, is called a program
slice [Ref. 14: pg. 439, Ref. 15: pg. 446].
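
A minimal sketch of the idea follows, assuming a straight-line
program in which each statement is described by the variable it
defines and the variables it uses. This is not WEISER's
algorithm, which must also handle branches and loops; the sample
program and the function backward_slice are hypothetical and
serve only to show statements being stripped for a given
criterion.

```python
# Each statement: (text, variable defined or None, variables used).
# The program itself is a made-up example.
PROGRAM = [
    ("READ N",            "N",     set()),
    ("TOTAL = 0",         "TOTAL", set()),
    ("COUNT = 0",         "COUNT", set()),
    ("TOTAL = TOTAL + N", "TOTAL", {"TOTAL", "N"}),
    ("COUNT = COUNT + 1", "COUNT", {"COUNT"}),
    ("PRINT COUNT",       None,    {"COUNT"}),
    ("PRINT TOTAL",       None,    {"TOTAL"}),
]


def backward_slice(program, variable):
    """Return the statements that can influence `variable` at program end.

    Simplifying assumption: straight-line code only, so one backward
    pass over the statements is enough.
    """
    relevant = {variable}   # variables whose earlier values still matter
    kept = []
    for text, defined, used in reversed(program):
        if defined in relevant:
            kept.append(text)
            relevant.discard(defined)  # this definition supplies the value
            relevant |= used           # its inputs now matter instead
    kept.reverse()
    return kept


print(backward_slice(PROGRAM, "TOTAL"))
# ['READ N', 'TOTAL = 0', 'TOTAL = TOTAL + N']
print(backward_slice(PROGRAM, "COUNT"))
# ['COUNT = 0', 'COUNT = COUNT + 1']
```

The two slices overlap nowhere in this small case, but in general
a single line may appear in several slices, which is the sense in
which a line is associated with more than one purpose.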

Slicing is important in maintenance because typically
only a subset of the program's behaviors is being improved
or replaced. By eliminating non-influential code, the
maintainer's job is made simpler. He or she can then deal
with a much smaller 'program'. While this program may not
be syntactically correct, it is semantically correct for the
behavior of interest.

Also, the entire piece of software need not be sliced.
If a point in the flow of control can be identified which
bounds the slicing criterion, then only that part of the
code still to be executed need be sliced. This further
reduces the programmer's task.

Two key areas of the knowledge base are especially
influential in determining the effectiveness of a
programmer's slicing ability. Programming logic allows the
maintainer to easily identify bounds of a specific behavior.
He or she can, with an extensive knowledge base, trace
through the program's flow of control easily and accurately,
recognizing particular logic features of the language.
Also, the expert's in-depth knowledge of the programming
language gives him or her the ability to readily identify
lines of code which impact the slicing criterion. For
example, familiarity with how data is passed and whether or
not it is altered by code or simply used and returned
without change (i.e. pass by reference, value, or name) could
greatly affect the size of the slice.

The extent to which experts employ slicing seems to
depend on the program. Testing by WEISER shows that factors
influencing the use of slicing are code size, structure, and
ease of understanding [Ref. 1£: pp. 4£5£-4€1]. This suggests
that slicing is found by experts to be most effective on
poorly structured programs, and less so on those which are
well designed and make use of modules, comments, and
mnemonics. Effectiveness here is a relative measure of the
amount of work eliminated and/or information gained by
slicing.

The work by WEISER also demonstrates that expert
programmers independently develop their own style of
slicing. This does not preclude teaching its principles to
less able programmers, but points out the process'
dependence on the knowledge and experience of the
individual. It also points to the fact that it is a
subjective process and cannot presently be implemented
fully. For the interested reader, however, WEISER does
describe algorithms for approximating slices and discusses
the effectiveness of two automatic slicing tools [Ref. 14].


C. HYPOTHESES

The third, and perhaps most powerful, process used by
experts is hypothesis generation, refinement, and
verification. It is a top-down process which allows for
maximum utilization of the programmer's knowledge base, the
overall depth of which determines the effectiveness of the
process. It involves the generation, based on information
in the knowledge base, and subsequent refinement and
verification of hypotheses regarding the programmer's
suppositions about how the code was designed and written.
As more and more information about the software is
processed, a hierarchy of these hypotheses is constructed.

This hierarchy is built quasi depth-first. This is
because a programmer has a tendency to focus on one area,
forming a cascade of refinement hypotheses through several
levels before shifting his or her attention. The programmer
does, however, remain cognizant of the other areas.
Therefore, information encountered while refining the
current area of interest is often used to form hypotheses
relating to these other areas as well.

The hierarchical structure can be thought of as defining
levels of understanding. The greater the depth, the more
the programmer has refined his or her understanding of the
software. By building this hierarchy, the programmer is
creating an internal representation of the program,
independent of any programming language. The goal or ideal
is that, at any level of understanding, the programmer
should be able to produce a functionally equivalent program
in any language that he or she is familiar with.

The title of the program, or a succinct presentation of
the task for which the software was written, generally
suggests enough information for the programmer to generate a
hypothesis about the general flow of the program. This
hypothesis would incorporate expected input and output types
with a corresponding class or group of possible data
structures. It would also have classes of algorithms and
abstract logical constructs in its make-up, with the
programmer essentially forming an overview of how the
program might work. Note that these are classes and not
specific elements.

As more information about the program is processed,
these ideas are refined by generating other, more specific
hypotheses based on new, more focused expectations. As
mentioned, a hierarchy would begin to form, each level
further refining the expectations used to generate the
hypotheses above. As each new level is formed, it
incorporates more information about the program. The result
is more factual information in support of these hypotheses,
and less supposition based on previous knowledge of similar
tasks. This is not to say that knowledge base information
is replaced by that newly learned about the task. Rather,
facts about the problem are used to verify, whenever
possible, the supposed information. Only when a
contradiction occurs is this information replaced.
Obviously, this process is dependent on the programmer's
having seen similar problems before. It seems appropriate,
therefore, to digress for a moment to address this idea of
sameness or analogy.

As was mentioned before, information in memory is
organized into groups based on certain parameters or
constraints. How, in fact, this grouping is accomplished,
is still not understood; however it does occur. As
associations are virtually limitless, it seems logical to
assume that groupings are as well. Similar problems could
therefore be grouped and an abstract set of circumstances
formed to encompass dominant characteristics of the group.
This idea is similar to that of a frame. Then, as problems
are introduced, they are compared against these dominant
characteristics. If the characteristics match, the problem
is considered analogous.

As this matching process seems a mammoth task as
presented, consider the reduction of work if these sets of
circumstances were grouped by single characteristics,
incorporating confidence levels, or another method of
rating, to distinguish most from least dominant in the set.
This would cause stronger and weaker associations, leading
to the most probable set first, analogous to an electron
following the path of least resistance. This type of
organization would greatly reduce the amount of searching
necessary to identify this class of situations.
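
The organization suggested here can be sketched as an index keyed
by single characteristics, each entry rated by a confidence
level. The fragment below is an illustration under stated
assumptions, not a claim about how memory actually works: the
problem classes, characteristics, ratings, and the functions
build_index and most_probable are all hypothetical, and summing
the ratings is only one possible method of scoring.

```python
# Hypothetical knowledge base: each problem class lists its dominant
# characteristics with an assumed confidence rating between 0 and 1.
PROBLEM_CLASSES = {
    "averaging": {"reads numbers": 0.9, "accumulates sum": 0.8,
                  "divides by count": 0.9},
    "sorting":   {"reads numbers": 0.6, "compares pairs": 0.9,
                  "swaps elements": 0.8},
    "searching": {"compares pairs": 0.7, "stops on match": 0.9},
}


def build_index(classes):
    """Group problem classes by single characteristic, strongest first."""
    index = {}
    for name, traits in classes.items():
        for trait, confidence in traits.items():
            index.setdefault(trait, []).append((confidence, name))
    for entries in index.values():
        entries.sort(reverse=True)
    return index


def most_probable(index, observed):
    """Rank candidate classes by summed confidence over observed traits."""
    scores = {}
    for trait in observed:
        for confidence, name in index.get(trait, []):
            scores[name] = scores.get(name, 0.0) + confidence
    return sorted(scores, key=lambda name: (-scores[name], name))


INDEX = build_index(PROBLEM_CLASSES)
ranking = most_probable(INDEX, ["reads numbers", "accumulates sum"])
print(ranking)  # ['averaging', 'sorting']
```

Only classes sharing at least one observed characteristic are
ever examined, which is the reduction in searching the text
describes.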

The benefits of these analogies, when they exist, are
taken advantage of in generating hypotheses. As stated
earlier, the programmer makes maximum use of his or her
knowledge base. This is accomplished by relying on
previously learned information regarding a general solution
already familiar to him or her. In this case, the specifics
of the software solution need only be learned if and when
they are needed and differ from those of the general one.
This is a much reduced task, relative to learning the entire
solution (or program) when no such analogies exist in the
knowledge base.

Returning to the discussion of hypotheses, the
hierarchical structure can be explained easily by once again
using a semantic net representation. Each hypothesis can be
thought of as a frame. Each slot value of a frame would
either be an information element or a frame itself,
obviously more specific than the one whose slot it fills.

Initially, all frames (hypotheses) would contain either
default or normal values. As more information is processed
regarding the software, these values would be confirmed or
replaced. These new values could be frames, representing
still more specific hypotheses. Normal values, when
confirmed, are replaced by assertions specific to the
problem at hand.


Each introduction of new information causes a
reorganization of memory due to the change in context. This
reorganization would make use of confirmed information, old
or new, and may cause a change in default or normal values
not yet verified. If this change in context occurs at a low
level of the hierarchy, the programmer's perspective will
change only slightly. If, however, the change affects slot
values in the top levels, reorganization of a large subtree
might occur, giving the programmer a significantly different
view of the problem. The view could also change if the
programmer chooses to shift his or her attention from the
overall view, to a more refined hypothesis, focusing then on
a subtree of the hierarchy. This would have the effect of
emphasizing the details contained in this subtree and
"chunking" the remainder. The hypothesis hierarchy is
therefore dynamic, changing with every shift in context.

Verification can take place at any time. It usually
occurs when the programmer reaches a level of understanding
about the behavior of the program that he or she wishes to
confirm. This can be because the programmer has reached a
level of understanding believed adequate for the task he or
she needs to perform, or it might simply be to validate
certain hypotheses before continuing. One reason for
intermediate validation is that it lessens the effects of
discovering an invalid hypothesis or contradiction.

In verification, the hypotheses forming the leaves of
the tree are tested against the code. Two conditions are
necessary for verification of the hierarchy. First, code
corresponding to the hypothesis being verified must be in
the program. Second, ALL code must be accounted for by one
of the hypotheses. If either of these conditions fails, the
structure is reorganized to reflect this and any new
information gained from it.


V. SCENARIO


A scenario is now presented to help exemplify how each
process applies to the task of program comprehension. It is
meant to give the reader an intuitive understanding of
application and effects, as well as the mechanisms
underlying these cognitive processes. The reader should
also gain an understanding of the interrelationships between
the processes, the knowledge base, and information relating
specifically to the problem. It is the collective use of
these which gives the expert his or her superior skills.
For simplicity, a structured program is assumed as well as
an ALGOL-like programming language. Again, semantic nets
are used to represent memory organization.

The program used for this scenario will be one which
computes averages of student grades and outputs a letter
grade for each. It is a fairly structured program with
adequate documentation and uses mnemonics but no comments in
the source code.


A. A WALK-THROUGH

Suppose a programmer is given a program that he or she
has never seen before and asked to perform some modification
to it. Further suppose that to do this modification, an
overall understanding of the program is necessary. He or
she most likely begins by looking at the documentation.

After reading a small part of the documentation, perhaps
the title or a sentence, the programmer forms a hypothesis. He
or she has ascertained that the program averages student
grades. This defines a context, and a reorganization of
memory takes place. This reorganization results in a large
information cluster, forming a frame. It contains slots
such as INPUT DATA, OUTPUT DATA, and PROCESSES.

The value of the INPUT DATA slot, based on the
programmer's knowledge of how school grades are arrived at,
is a cluster of possible types or classes of data. These
would include, at this level, every type of data in his or
her knowledge base that the programmer associates with
school grades, as well as all possible data structures
associated with them. The values of the other slots would
be of a similar nature.

So by simply reading a single phrase, 'computes student
grade averages', the programmer has constructed an internal
representation of the program. He or she expects that it
takes some input data, processes this data, and outputs the
result. In addition, he or she has identified an input
domain, an output domain, and a domain of algorithms on
which the processing of the data is assumed based. While
this is certainly not specific enough a representation of
the software to enable the programmer to do any useful work,
a level of understanding has been achieved.

Further reading of the documentation reveals that each
student's grades will be read in, summed, and the average
converted to a letter grade and stored. This information
suggests many, more specific, data and algorithmic classes,
and several levels of hypotheses are formulated. Presuming
that, at this point, the programmer begins to develop
hypotheses in a quasi depth-first order, focusing on input,
one hypothesis would be that grades are read in as numbers.
Another might be that each student's identification is input
in conjunction with his or her grades. The grade data
hypothesis is then refined, forming a lower level hypothesis
that grades will be represented as integers and handled as a
list. Note that at this point, the programmer is not
interested in what representation is used for student
identification, possibly because hypotheses about the
processing of the data suggest that the identification data
will be used but not altered, so specific typing will not be
necessary.

In memory, each hypothesis is represented as a frame
with ordered slots. This ordering, if relevant, is based on
the expected or confirmed ordering of the representative
information in the program, otherwise it is arbitrary. For
example, the ordering of algorithms would be important in
understanding the program, whereas the ordering of data
classes in the frames created from the input hypotheses, for
example the one representing the hypothesis that both grades
and student identification are input, is not important for
program understanding. If subsequent analysis reveals that
a specific ordering is necessary, the frame would be
reorganized to reflect this, because of the new context.

The value of each slot is an information cluster
representing a knowledge domain, as frames representing
hypotheses use classes of information and not specific
elements. The cluster is formed based on the context
defined by the hypothesis which the frame or slot
represents. The initial hypothesis' INPUT slot has, as a
value, a cluster representing all data types or classes that
the programmer associates with grades. When the subsequent
hypotheses are formed, defining the input as STUDENT IDENT
and GRADE, this cluster is reorganized into a two slot
frame, each slot representing a sub-cluster of the original.
The value of the STUDENT IDENT slot becomes all possible
representations by which students can be identified, and the
value of the GRADE slot becomes the cluster of all possible
classes of grade representation contained in the knowledge
base. Any elements or nodes of the original cluster not
associated with either of these new clusters are not
"visible" from this frame down, similar to the idea of
scoping in some programming languages. So on one level,
there is a single cluster representing the hypothesis as a
grouping of all possible input data classes, while on
another level, this same information, or a subset of it, is
viewed as two separate clusters. This reorganization of
information occurs because of the change in context when the
subsidiary hypotheses are introduced.
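
The reorganization of a cluster into a two slot frame can be
sketched in code. What follows is a minimal illustration,
assuming that a cluster may be modeled as a set and a frame as a
named collection of slots; the class Frame, its methods refine
and visible, and every cluster member listed are hypothetical
and are not drawn from the study.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A hypothesis: a name plus slots holding clusters or sub-frames."""
    name: str
    slots: dict = field(default_factory=dict)

    def refine(self, slot, sub_slots):
        """Reorganize one slot's cluster into a frame of sub-clusters.

        Members of the old cluster that belong to none of the new
        sub-clusters are no longer visible from this frame down.
        """
        old_cluster = self.slots[slot]
        child = Frame(slot)
        for sub_name, members in sub_slots.items():
            child.slots[sub_name] = old_cluster & set(members)
        self.slots[slot] = child
        return child

    def visible(self, slot):
        """Return every cluster member reachable through `slot`."""
        value = self.slots[slot]
        if isinstance(value, Frame):
            result = set()
            for name in value.slots:
                result |= value.visible(name)
            return result
        return set(value)


# Initial hypothesis; the cluster contents are invented examples.
program = Frame("PROGRAM AVERAGES GRADES", {
    "INPUT DATA": {"integer", "real", "letter", "name",
                   "id number", "list", "record"},
    "OUTPUT DATA": {"letter", "report"},
    "PROCESSES": {"summation", "division", "table lookup"},
})

before = program.visible("INPUT DATA")
program.refine("INPUT DATA", {
    "STUDENT IDENT": {"name", "id number"},
    "GRADE": {"integer", "real", "list"},
})
after = program.visible("INPUT DATA")

print(sorted(before - after))  # ['letter', 'record']
```

The two members that drop out of view correspond to nodes of the
original cluster that are associated with neither new
sub-cluster.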

The programmer has now increased his or her
understanding of the program. In addition to what was
expected based on the original hypothesis, the programmer
now also expects that:

- grades are numerical
- each student's set of grades is processed separately
- the grades are initially input into a list structure
- the grades are summed and averaged
- each student is identified with his or her grades
- a mapping takes place from average to letter
- student ID and corresponding letter grade are stored

Figure € shows this representation focusing on the input
subtree of the hypothesis hierarchy. Each level can be
thought of as a level of understanding. It should be noted
that, at this point, no verification has taken place and
this level of understanding is contingent on the correctness
of the hypotheses formed. However, this understanding is
not appreciably diminished unless the erroneous hypothesis
is located in a top level of the hierarchy.

Continuing to focus on input, in order to verify this
representation the programmer needs to slice the source code
using input behavior as the criterion. Then, each line of
code in the slice must be mapped to a leaf-frame or slot of
the input subtree. Note that these leaf-frames or slots do
not all have to be on the same level.


[Figure: hypothesis hierarchy, input subtree. Node labels:
PROGRAM AVERAGES GRADES; TAKES INPUT, PROCESSES AND OUTPUTS
RESULTS; INPUT DATA ASSOCIATED WITH STUDENT GRADES; STUDENT
ID; STUDENT GRADES DATA; LIST CLASS D.S.; INTEGER CLASS D.S.]

Figure € - Memory Representation of Program (Input)

Assume the following is the result of the slicing
process:


    READ STUDENT
    REPEAT
        I = I + 1
        READ STUD_GRADE[I]
    UNTIL STUD_GRADE[I] = 999

The programmer now attempts to verify the hypotheses against
the code. The READ STUDENT line stands alone as
verification of the hypothesis that each student is input.
To verify the two hypotheses associated with grades is
slightly more complicated. The READ STUD_GRADE[I] statement
would be adequate to verify the hypothesis that student
grades were input. However, it fails to confirm that it has
a numerical representation. To confirm this, if no
declaration statement exists, the programmer must analyse
the behavior of the variable. The code resulting from the
slicing process based on input is itself sliced, this time
on STUD_GRADE[I]. The UNTIL STUD_GRADE[I] = 999 statement
becomes the only other line in the slice.

The programmer recognizes the UNTIL statement as a
compare and branch operation and notes that the variable is
compared to a number. His or her knowledge of the
programming language is extensive enough to realize that 999
must be a number and not a string. Also, he or she knows
that if a number is compared to anything but another number,
a 'type mismatch' occurs. Therefore, STUD_GRADE[I] must be
a number. This verifies the first slot of the frame.

The REPEAT-UNTIL block of the original slice is
recognized as a looping construct. This, coupled with the
fact that one variable inside the loop is used as an index,
allows the programmer to chunk the block as "BUILD AN
ARRAY". This chunk is associated with the grade input and,
based on this context, the information cluster associated
with the grade data structure is processed. It is found to
include the class of array data structures, and so the
second slot and its corresponding hypothesis are also
verified. With all code now mapped, the entire input
representation is considered verified, as all higher level
hypotheses inherit the verification. Also, with reference
to the last verification, it should be noted that the
information cluster and hypothesis were further refined to
reflect that a particular class, the array class, of list
structures was used.
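
The two conditions for verification, that every leaf hypothesis
is supported by code and that all code is accounted for by some
hypothesis, can be checked mechanically once the mapping has
been made. The sketch below assumes the mapping is supplied by
the programmer; the hypothesis labels paraphrase the text, the
index update is written as I = I + 1 by assumption, and the
function verify is hypothetical.

```python
# Leaf hypotheses of the input subtree (labels paraphrase the text).
LEAF_HYPOTHESES = [
    "student is input",
    "grades are numbers",
    "grades are held in a list",
]

# Assumed record of which leaves each line of the slice supports.
MAPPING = {
    "READ STUDENT":              ["student is input"],
    "REPEAT":                    ["grades are held in a list"],
    "I = I + 1":                 ["grades are held in a list"],
    "READ STUD_GRADE[I]":        ["grades are held in a list"],
    "UNTIL STUD_GRADE[I] = 999": ["grades are numbers",
                                  "grades are held in a list"],
}


def verify(leaves, mapping):
    """Apply both conditions.

    Returns (verified, unsupported_leaves, unmapped_lines).
    """
    supported = {leaf for targets in mapping.values() for leaf in targets}
    unsupported = [leaf for leaf in leaves if leaf not in supported]
    unmapped = [line for line, targets in mapping.items() if not targets]
    return (not unsupported and not unmapped), unsupported, unmapped


print(verify(LEAF_HYPOTHESES, MAPPING))  # (True, [], [])

# A line that no hypothesis explains makes verification fail.
extended = dict(MAPPING)
extended["READ SECTION"] = []            # hypothetical extra line
print(verify(LEAF_HYPOTHESES, extended))
# (False, [], ['READ SECTION'])
```

A failure of either kind is the signal, in the terms used here,
that the structure must be reorganized.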

If a contradiction does occur in verification, a walk up
the subtree takes place. Each hypothesis is checked until
one is found which the information does not contradict. A
new hypothesis is formed at the next lower level as a
refinement of this hypothesis, and all hypotheses below this
new one are reevaluated based on the new context. A similar
process takes place if information, other than that
expected, is found and needs to be included in the
representation. Obviously, the higher up the tree the
change takes place, the greater the memory reorganization
necessary.
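
The walk up the subtree can be sketched as follows. This is an
illustration under assumptions, not a description of an
implemented tool: the class Hypothesis, the functions walk_up
and revise, and the sample hypotheses are hypothetical, the test
for contradiction is supplied by the caller, and the
reevaluation of lower hypotheses is not modeled.

```python
class Hypothesis:
    """One node of the hypothesis hierarchy."""

    def __init__(self, statement, parent=None):
        self.statement = statement
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def walk_up(failed, contradicts):
    """Climb to the nearest hypothesis the new information allows.

    Returns that hypothesis (or None) and the number of levels climbed.
    """
    node = failed
    levels = 0
    while node is not None and contradicts(node):
        node = node.parent
        levels += 1
    return node, levels


def revise(failed, contradicts, new_statement):
    """Form a new refinement beneath the first uncontradicted ancestor."""
    anchor, levels = walk_up(failed, contradicts)
    if anchor is None:
        raise ValueError("every hypothesis up to the root is contradicted")
    # Drop the contradicted branch before attaching the replacement.
    anchor.children = [c for c in anchor.children if not contradicts(c)]
    return Hypothesis(new_statement, anchor), levels


root = Hypothesis("program averages grades")
inputs = Hypothesis("student ident and grades are input", root)
numbers = Hypothesis("grades are numbers", inputs)
integers = Hypothesis("grades are integers", numbers)


# Hypothetical new information: a grade such as 87.5 is found.
def contradicted(hypothesis):
    return hypothesis.statement == "grades are integers"


replacement, levels = revise(integers, contradicted, "grades are reals")
print(replacement.parent.statement, levels)  # grades are numbers 1
```

The number of levels climbed gives a rough measure of how much
of the tree is affected, in line with the observation that a
change higher up requires greater reorganization.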

Up to this point, the programmer has been forming the
program representation using a top-down approach. However,
there are times when a bottom-up inductive approach is also
necessary. Usually this approach is taken when a
programmer's knowledge base, regarding the task domain, is
incomplete, or when atypical algorithms are used. Here is
where chunking plays a major role. The purpose of this next
example is to demonstrate this role, and not to describe, in
detail, the inductive process.

Suppose the programmer is confronted with a module or
block of code that he or she has formed no hypothesis about
at a specific level. Using the grade averaging example,
assume that the programmer has no knowledge of how averages
are computed, and that the algorithm used is unknown to him
or her. The programmer now tries to understand the
algorithm by inductively reasoning about the code based on
his or her knowledge of lower level functions performed
within it.

At the lowest level, this is accomplished by looking at
individual lines of code and assigning them interpretations
[Ref. 12]. However, because the expert's knowledge base
contains information about constructs and their uses,
certain of these lines are recognized as code included in
the performance of a specific function. BROOKS calls these
'beacons'.

The block of code is for a standard averaging routine:


    I = 0
    SUM = 0
    WHILE STUD_GRADE[I] <> 999 DO
        SUM = SUM + STUD_GRADE[I]
        I = I + 1
    END WHILE
    AVERAGE = SUM / I


The programmer analysing this code recognizes the first
two lines as assignment statements, and interprets them
individually. He or she now looks at the WHILE line and
recognizes it as a looping construct and beacon for several
functional uses. The next assignment statement has the
assignment variable on both sides of the equal sign, and so
is interpreted as changing the value of SUM by performing
some operation on it, rather than simply assigning it a
value. Once the value added is recognized as an indexed
value, the programmer chunks the loop. He or she has
knowledge base information which shows that an indexed
variable added to that type of assignment statement
indicates an array summation process. So these four lines
are chunked as "SUM STUDENT GRADES". Also, the first two
lines are now chunked as "VARIABLE INITIALIZATION" based on
this new information. The last line is interpreted as an
assignment statement which computes the grade average by
dividing the sum of the grades by the number of grades
summed.
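
For readers who wish to run the routine, the same chunks can be
marked in a Python rendering. This is a sketch under stated
assumptions rather than the program used in the scenario: the
function name average_grades and the constant SENTINEL are
hypothetical, indexing is assumed to start at zero, and the
input is assumed to contain at least one grade before the
sentinel value.

```python
SENTINEL = 999  # end-of-data marker, as in the walk-through


def average_grades(stud_grade):
    """Average the grades that precede the sentinel value."""
    # Chunk 1: VARIABLE INITIALIZATION
    i = 0
    total = 0

    # Chunk 2: SUM STUDENT GRADES
    # (a loop plus an indexed value added to an accumulator)
    while stud_grade[i] != SENTINEL:
        total = total + stud_grade[i]
        i = i + 1

    # Chunk 3: divide the sum by the number of grades summed
    return total / i


print(average_grades([80, 90, 100, SENTINEL]))  # 90.0
```

The comments mark where each chunk begins, so the body can be
read as three units rather than as individual statements.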

By chunking, the programmer has taken a piece of code,
which could be considered a single chunk which 'COMPUTES
GRADE AVERAGES', and formed a representation through
inductive reasoning. The original seven lines of code can
now be interpreted as:

- Initialize variables
- Sum grades
- Divide sum by number of grades summed

This representation can stay in short term memory to be used
for the present task, being linked to the representation of
the rest of the program in long term memory, and/or can be
used to learn an averaging algorithm which could then be
used for other tasks as well. And, once learned, the
representation could be added to that in long term memory.


VI. RECOMMENDATIONS


This study has presented a theoretical model of simple
cognitive processes developed and used by programmers.
Further, the study has attempted to demonstrate how the
expert, by using these processes, gains an in-depth
understanding of complex programs. It is unrealistic, at
present, to fully test these ideas because methodologies
have not been developed in the behavioral sciences to do
this. Also, the requisite size and complexity of the
programs, and the time involved, are prohibitive. Research
and the results of limited testing on small scale programs,
however, do suggest certain design techniques, and coding
and documentation methods which directly influence the
effectiveness of these processes.

One such area is code structure, which should be
designed so as to suggest chunks to anyone attempting to
comprehend it [Ref. 13: pg. 175]. Functional elements of
the code should be implemented as contiguous blocks of text
whenever possible. Arbitrary GOTOs and forward and
backward JUMPs should be avoided. Control flow statements
should be used to direct flow from the exit point of one
chunk to the entry point of others. All these
considerations enhance the chunking process by making blocks
of code recognizable as single functions. This results in
making it easier to use the text of the program as an
external memory for those chunks.

Tests conducted by WEISER also indicated that code
structure influences slicing [Ref. 1£]. It was found that a
much higher degree of slicing, among 21 expert programmers,
took place when analysing a poorly structured program with
indiscriminate use of GOTOs and non-mnemonic variable names
than when analysing programs which make use of modular
designs, mnemonics, and comments. The value of proper use
of mnemonics and comments to the slicing process is that
they serve to explicitly show data flow and to group
associated statements and functions. This lessens the need
for programmers to ferret out this information. One can
conclude that less effort was required to achieve an equal
level of understanding when good programming techniques were
employed. The use of these maximizes the effectiveness of
slicing while minimizing the effort necessary.

Comments and mnemonics are also helpful to the chunking
process. A well placed comment, specifying the purpose of a
block of code, and perhaps the data elements affected,
explicitly identifies a functional chunk. This chunk could
then easily be encoded based on the comment alone,
eliminating the need for code analysis at that point.
Meaningful mnemonics would give some insight into their
purpose and thus both aid the recognition and chunking of
complex data structures and help to form complete
hypotheses. These could then be incorporated into still
larger chunks, allowing the many data elements which make up
the structure to be processed as a single element in memory.
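
The contrast can be sketched as follows. Both functions
compute the same value; all names are hypothetical and
chosen only for illustration. The second version carries a
comment stating its purpose and the data it reads, uses
mnemonic names, and groups the related data elements into
one structure that can be held in memory as a single unit.

```python
from dataclasses import dataclass


# Version without mnemonics: the reader must analyse the body
# to guess what is being computed.
def f(a, b, c):
    return a * b * (1 + c)


@dataclass
class OrderLine:
    """One line of an order; treated by the reader as a single element."""
    unit_price: float
    quantity: int
    tax_rate: float


# Compute the amount billed for one order line, tax included.
# Reads: unit_price, quantity, tax_rate.  Modifies nothing.
def billed_amount(line: OrderLine) -> float:
    subtotal = line.unit_price * line.quantity
    return subtotal * (1 + line.tax_rate)
```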

Program documentation can be, itself, a wealth of
information for the expert programmer. A natural language
explanation of the approach taken in originally designing
the software facilitates the formulation of a fairly
accurate hypothesis regarding its implementation. Citing
explicitly the algorithms employed enables verification of
certain hypotheses without extensive code analysis. Using
this information, the maintainer can more easily focus on
certain functions or behaviors of the code without having to
first analyse it in depth to determine the specifics of its
implementation. If exceptions to standard algorithmic
coding are noted, it saves the programmer from having to
determine why it was coded in such a way. Also, if subtle
effects of the code are included in the documentation, along
with certain potentials for side effects, it would reduce
the testing necessary when a modification is made.
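
A small sketch of documentation in this spirit is given
below. The routine and its name are hypothetical. Its header
names the algorithm employed, notes where the coding departs
from the standard form and why, and states its side effects,
so that a maintainer can form and verify a hypothesis before
reading the body.

```python
def find_first(sorted_values, wanted):
    """Return the index of the first occurrence of `wanted`, or -1.

    Algorithm: binary search over a list sorted in ascending order.
    Deviation from the textbook form: the search does not stop at the
    first match; it keeps narrowing to the left so that, when
    duplicates are present, the lowest matching index is returned.
    Side effects: none; `sorted_values` is not modified.
    """
    low, high = 0, len(sorted_values)
    while low < high:
        middle = (low + high) // 2
        if sorted_values[middle] < wanted:
            low = middle + 1
        else:
            high = middle
    if low < len(sorted_values) and sorted_values[low] == wanted:
        return low
    return -1
```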

One final area which positively affects the use of these
processes is standardization on all levels. Use of a
standard design methodology would allow programmers to learn
how to best chunk and slice certain representative software
formats. "Beacons" identifying certain functional areas
could be learned and used effectively. Automatic tools to
aid these processes could also be developed with less
difficulty.

On a more specific level, standardization of algorithms,
and their corresponding constructs, would greatly simplify
the task of comprehension. Experts would be able to
incorporate these into their knowledge bases, learning them
from both the functional and the behavioral points of view.
Also, coding templates could be learned and associated with
these, aiding recognition of code itself.

Similar ideas have been used in most other engineering
fields with great success. While software engineering is
not, in many respects, as rigorous as these other
disciplines, standards could be made flexible enough so as
not to inhibit progress. Software reusability is the
motivation for recently generated interest in this area.
The programming language ADA is the first step in an attempt
at achieving some of this standardization, and its use in
conjunction with these processes may serve to verify their
validity.